perm filename HAW.DOC[AM,DBL] blob sn#275333 filedate 1977-04-09 generic text, type T, neo UTF8
00001	Here is my paper for the Rule-based Inference conference.  The format
00006	is close to -- but not absolutely identical to -- the one recommended
00011	in Waterman and Hayes-Roth's memo.
00016	
00021	To denote ITALICS, we use the notation &i[I am italicised!!].
00026	To denote underlining, we say &u[I'm underlined.].
00031	
00036	That  is,  ampersand, followed by "i" or "u", followed by the text in
00041	square brackets. Please  note  that  this  paper  appears  very  long
00046	because  it is in fixed-width font; the XGP version is of course much
00051	more compact.   The  name  of  this  file  is  HAW.DOC[c380dl22],  on
00056	CMU-10a.
00061	
00066	I  am in the  process of finalizing  the design of  the new discovery
00071	program (the successor to AM),  so any feedback which you have to the
00076	paper would be very welcome.
00081	
00086	
00091	  --  Doug Lenat
00100	Dear Don and Rick,
00200	
00300	Here are the format conventions for this paper:
00400	
00500	
00600	Italics is indicated as follows:        @i[some italicised text]
00700	Underlining is thus indicated:          @u[this is underlined]
00800	References are in single brackets:      see Waterman[25] for details
00900	Temporary switching to un-italicised font is occasionally also used:
01000	      @i[here in an italicised line is one @r[unitalicised] word]
01100	Index entries are superscripted and in brackets:    [%%249]
01200	Footnotes are superscripted:                        @r[3]
01300	
01400	
01500	Titles and subtitles are marked  as if they were italicised, but feel
01600	free to use various other, bolder, bigger fonts for them.
01700	
01800	The  final reference  (to Waterman, reference  number 25)  may not be
01900	complete. Please try to fill in  any more details you can on it.  All
02000	other references  are complete.   No further  changes need  be  made.
02100	Almost all the reviewer's suggestions have been followed.
02200	
02300	Can't wait until the conference!!
02400	
02500	Regards,
02600	Doug and Greg
     

00100	
00200	
00300	
00400	                        @i[Designing a Rule System That
00500	                      Searches for Scientific Discoveries]
00600	
00700	
00800	
00900	
01000	
01100	
01200	                              @i[Douglas B. Lenat
01300	                                      and
01400	                                Gregory  Harris]
01500	
01600	
01700	                         @i[Computer Science Department
01800	                           Carnegie-Mellon University
01900	                             Pittsburgh, Pa. 15213]
02000	
02100	
02200	
02300	
02400	
02500	
02600	
02700	
02800	
02900	
03000	                          DRAFT   (@i[April 8, 1977])
03100	
03200	
03300	
03400	
03500	
03600	
03700	
03800	
03900	
04000	
04100	
04200	
04300	
04400	
04500	
04600	
04700	
04800	
04900	
05000	
05100	     This  work was  supported  in part  by the  Defense  Advanced Research
05200	     Projects  Agency (F44620-73-C-0074)  and  monitored by  the  Air Force
05300	     Office of Scientific Research.
     

00100	Lenat & Harris                                                             p.  i
00200	
00300	
00400	@i[1.  The Basic Argument]                                                     1
00500	
00600	@i[2.  Early Design Constraints]                                               4
00700	
00800	                                                        [%%190]
00900	@i[3.  @u['AM']: A Rule System For Math Theory Formation        ]              6
01000	@i[3.1.]  Discovery in Mathematics as Heuristic Rule-Guided Search             7
01100	                        [%%165]
01200	@i[3.2.]  Representation        of Mathematical Knowledge                      8
01300	@i[3.3.]  Top-level Control: An Agenda of Promising Questions                  9
01400	@i[3.4.]  Low-level Control: A Lattice of Heuristic Rules                     13
01500	@i[3.5.]  Behavior of this Rule System                                        15
01600	
01700	@i[4.  Reexamining the Design]                                                19
01800	@i[4.1.]  Data Structures                                                     19
01900	               [%%225]
02000	@i[4.2.]  Rules                                                               25
02100	@i[4.3.]  Distribution of Knowledge Between Rules and DS                      31
02200	@i[4.4.]  Interpreter                                                         35
02300	
02400	@i[5.  Speculations for a New Discovery System]                               36
02500	@i[5.1.]  A New Set of Design Constraints                                     37
     

00100	
00200	
00300	
00400	
00500	
00600	
00700	
00800	
00900	
01000	
01100	                                  @i[ABSTRACT]
01200	
01300	
01400	
01500	
01600	
01700	
01800	
01900	
02000	
02100	
02200	
02300	
02400	
02500	     Some   scientific   inference   tasks   (including   mass  spectrum
02600	     identification  [Dendral],  medical  diagnosis  [Mycin],  and  math
02700	     theory development [AM])  have been successfully modelled  as rule-
02800	     directed search processes.   These rule systems are  designed quite
02900	     differently from "pure  production systems". By  concentrating upon
03000	     the  design of  one program  (AM), we  shall show  how 13  kinds of
03100	     design deviations arise from @i[(i)] the level of sophistication of
03200	     the  task that  the  system is  designed to  perform,  @i[(ii)] the
03300	     inherent nature of the task, and @i[(iii)] the  designer's @i[view]
03400	     of  the task.   The  limitations of  AM suggest  even  more radical
03500	     departures from  traditional rule  system architecture.   All these
03600	     modifications are  then collected  into a  new, complicated  set of
03700	     constraints  on the  form of  the data  structures, the  rules, the
03800	     interpreter, and  the distribution of  knowledge between  rules and
03900	     data structures.   These new policies  sacrifice uniformity  in the
04000	     interests  of  clarity,  efficiency  and  power  derivable  from  a
04100	     thorough  characterization  of   the  task.   Rule   systems  whose
04200	     architectures conform  to the  new design  principles will  be more
04300	     awkward for  many tasks than  would "pure"  systems.  Nevertheless,
04400	     the  new architecture  should  be significantly  more  powerful and
04500	     natural  for building  rule  systems that  do  scientific discovery
04600	     tasks.
     

00100	Lenat & Harris                                                             p.  1
00200	
00300	
00400	1.  The Basic Argument
00500	
00600	
00700	Although  rule-based  computation was  originally  used for  formal  and systems
00800	
00900	purposes [Post,Markov,Floyd], researchers in Artificial Intelligence  (AI) found
01000	
01100	that  the same  methodology was  also  useful for  modelling a  wide  variety of
01200	
01300	                    [%%161]
01400	sophisticated tasks.        Many of these early AI rule-based programs -- called
01500	
01600	                    [%%220]
01700	"production systems"        -- served as information processing models of humans
01800	
01900	performing cognitive tasks in  several domains (digit recall [19],  algebra word
02000	
02100	                                  [%%249]
02200	problem solving [1], poker playing        [23], etc. [16,18]).
02300	
02400	
02500	                      [%%260]
02600	There were many design         constraints present in the classical  formal rule
02700	
02800	based  systems.  Many  of  these details  were  preserved in  the  AI production
02900	
03000	    [%%219]
03100	rule        based programs  (e.g., forcing all  state information into  a single
03200	
03300	string of tokens). But there were many changes. The whole notion of "what a rule
03400	
03500	system really is" changed from  an effective problem statement to a  tendency to
03600	
03700	solve problems  in a particular  way.  One typical  corollary of this  change of
03800	
03900	view was  that instead  of @i[no] external  inputs whatsoever,  there was  now a
04000	
04100	@i[presumption] of some "environment" which supplied new entries into  the token
04200	
04300	sequence.   In the  next section  (see  Figure 1)  is an  articulation  of these
04400	
04500	                 [%%223]
04600	@i[neo]-classical        (i.e., AI circa 1973; see [7]) principles for designing
04700	
04800	"pure" production systems.
04900	
05000	
05100	Due  to  the   early  successes,  psychological  applicability,   and  aesthetic
     

00100	Lenat & Harris                                                             p.  2
00200	
00300	
00400	simplicity afforded by  production systems, AI  researchers began to  write rule
00500	
00600	                                                           [%%147]
00700	systems  (RSs)  to  perform  informal  inductive  inference         tasks  (mass
00800	
00900	spectrum identification  [4], medical diagnosis  [23] and  consultation dialogue
01000	
01100	[6],  speech  understanding  [14],  non-resolution  theorem  proving  [0],  math
01200	
01300	research [13], and many more).
01400	
01500	
01600	Yet it seems that  most of the large, successful  RSs have violated many  of the
01700	
01800	"pure production system" guidelines.  The purpose of this paper is to  show that
01900	
02000	such  "exceptions"  were  inevitable, because  any  system  satisfying  the neo-
02100	
02200	classical design constraints, though universal in principle, is too impoverished
02300	
02400	to represent complex tasks for what they are.
02500	
02600	
02700	                                             [%%23]
02800	The essence of the neo-classical architecture       is to opt for  simplicity in
02900	
03000	all things, since  there is very  little one can say  about RSs in  general.  As
03100	
03200	more becomes known about the task of the RS, it turns out that some of  that new
03300	
03400	knowledge takes the form of specific constraints on the design of the  RS itself
03500	
03600	(as distinct  from what specific  knowledge we choose  to represent  within that
03700	
03800	design).  Sometimes  a new  constraint directly  contradicts the  early, domain-
03900	
04000	independent one; sometimes it is  merely a softening or augmentation of  the old
04100	
04200	constraint.
04300	
04400	
04500	After examining the "pure" architecture,  we shall examine in detail  the design
04600	
04700	of one particular rule system which discovers and studies mathematical concepts.
04800	
04900	Deviations from the pure architecture will be both frequent and extreme.
     

00100	Lenat & Harris                                                             p.  3
00200	
00300	
00400	Subsequent sections will analyze these differences.  It will be shown  that each
00500	
00600	one is plausible -- usually for reasons which depend strongly on the "scientific
00700	
00800	discovery" domain of the RS. Some of the limitations of this RS will be treated,
00900	
01000	and  their elimination  will be  seen to  require abandoning  still more  of the
01100	
01200	original design constraints.
01300	
01400	
01500	When these  modifications are  collected, in  the final  section, we  shall have
01600	
01700	quite a  different set of  principles for building  RSs.  Not only  will naivete
01800	
01900	have  been  lost:  so  will  generality  (the  breadth  of  kinds  of  knowledge
02000	
02100	representable, the totality of tractable tasks).  Rule systems conforming to the
02200	
02300	new design will be  awkward for many tasks (just  as a sledge hammer  is awkward
02400	
02500	for cracking  eggs).  However,  they should be  significantly more  powerful and
02600	
02700	natural for scientific inference tasks.
     

00100	Lenat & Harris                                                             p.  4
00200	
00300	
00400	2.  Early Design Constraints
00500	
00600	
00700	By  a @i[rule  system] (RS)  we shall  mean any  collection  of condition-action
00800	
00900	@i[rules],  together  with  associated  @i[data  structures]  (DS;  also  called
01000	
01100	@i[memories])  which the  rules may  inspect and  alter.  There  must also  be a
01200	
01300	                                                                   [%%221]
01400	policy for @i[interpretation]: detecting and firing relevant rules.
01500	
01600	
01700	These definitions are deliberately  left vague.  Many details must  be specified
01800	
01900	for any actual  rule system (e.g.,  What may appear in  the condition part  of a
02000	
02100	                                                                   [%%260]
02200	rule?). This specification process is what we mean by @i[designing]        a RS.
02300	
02400	
02500	                                                  [%%54]
02600	Figure 1  contains an  articulation of  the design        of the  early general-
02700	
02800	purpose AI production  rule systems.  Notice the  common theme: the  adequacy of
02900	
03000	simplicity in all dimensions.
03100	
03200	
03300	                                                             [%%23]
03400	             FIGURE 1: Neo-classical Rule System Architecture
03500	
03600	       @r[1. ]Principle  of  Simple  Memories.   One  or  two  uniform data
03700	                   [%%142]
03800	         structures        define sufficient memories for a rule  system to
03900	                                                          [%%101]
04000	         read from and write  into. The format for entries         in these
04100	         structures is both uncomplicated and unchanging.
04200	
04300	                                              [%%74]
04400	       @r[2. ]Principle of Simple DS Accesses.       The primitive read and
04500	         write  operations  are  as  simple  and  low-level   as  possible;
04600	         typically they are simply  a membership-test type of read,  and an
04700	         insert-new-element type  of write.  More  complicated, algorithmic
04800	         operations on the memories are not available to the rules.
04900	
05000	       @r[3. ]Principle of Isolated  DS Elements.  Elements of  the uniform
05100	         DS cannot point to  (parts of) other elements.  This  follows from
05200	         the  preceding  principle:  if  we  aren't  allowed  to  @u[chase]
05300	         pointers, there may as well not be any.
     

00100	Lenat & Harris                                                             p.  5
00200	
00300	
00400	       @r[4. ]Principle of Continuous Attention. In addition to the  one or
00500	         two simple data structures,  there may be an  external environment
00600	         which continuously inserts stimuli into the DS.   The interleaving
00700	         of  stimuli  and  internally generated  symbols  is  managed quite
00800	         trivially: (a) The stimuli are simply inserted into the DS  as new
00900	         elements;  (b)  Each   rule  is  so   small  and  quick   that  no
01000	                                                                [%%159]
01100	         "interruption" mechanism is necessary.  The interpreter        may
01200	         ignore any suddenly-added stimulus until the current rule finishes
01300	         executing.  The  RS may be  viewed as "continuously"  attending to
01400	         the environment.
01500	
01600	       @r[5. ]Principle  of Opaque  Rules.  Rules  need not  have  a format
01700	         inspectable by other  rules, but rather  can be coded  in whatever
01800	         way is  convenient for  the programmer  and the  rule interpreter;
01900	         i.e., the set of rules is  @u[not] treated as one of the  RSs data
02000	         structures.  E.g., the condition parts of rules may be barred from
02100	         fully analyzing the set of productions [22], and the  action parts
02200	         of rules may not be allowed to delete existing rules [24].
02300	
02400	       @r[6. ]Principle of Simple  Rules.  Rules consist  of a left-  and a
02500	         right-hand side  which are  quite elementary:  The left  hand side
02600	         (lhs, situation characterization, IF-part, condition) is typically
02700	                        [%%64]
02800	         a pattern-match        composed with a  primitive DS  read access,
02900	                                                                      [%%5]
03000	         and the right hand side (rhs, consequence, THEN-part, action)
03100	         is also simply a primitive DS write access.  There is no  need for
03200	         sophisticated bundles  of DS  accesses on either  side of  a rule.
03300	         Thus several extra rules should be preferred to a single rule with
03400	         several actions.
03500	
03600	       @r[7. ]Principle  of Encoding  by  Coupled Rules.   A  collection of
03700	         interrelated  rules  is  used to  accomplish  each  subtask; i.e.,
03800	         wherever a  subroutine would be  used in a  procedural programming
03900	         language.  For example, programming an iteration may  require many
04000	         rules "coupled"  by writing and  reading special  (i.e., otherwise
04100	         meaningless) loop control notes in the data structure.
04200	
04300	       @r[8. ]Principle of Knowledge as Rules.  All knowledge  of substance
04400	         should be, can be, and is represented as rules.  This includes all
04500	         non-trivial domain-dependent information.   The role of the  DS is
04600	         just to hold simple descriptive information,  intermediate control
04700	         state messages, recent stimuli from the environment, etc.
04800	
04900	       @r[9. ]Principle of Simple Interpretation. The topmost  control flow
05000	                                                      [%%159]
05100	         in the RS  is via a simple  rule interpreter.        After  a rule
05200	         fires, it is essential that any rule in the system may potentially
     

00100	Lenat & Harris                                                             p.  6
00200	
00300	
00400	         be the next one to fire (i.e., it is forbidden to locate a  set of
00500	         relevant rules and fire them off in sequence).  When the rhs  of a
00600	         rule is executed, it  can (and frequently will)  drastically alter
00700	         the situation that determined which rules were relevant.
00800	
00900	       @r[10. ]Principle of Closure.  The representations allowed  by (1-9)
01000	         are sufficient  and appropriate  for organizing  all the  kinds of
01100	         knowledge needed for tasks for which a given RS is designed.
01200	
01300	
01400	
01500	This design was  plausible @i[a priori], and  worked quite well for  its initial
01600	
01700	applications (the  simulation of simple  human cognitive  processes [16,19,24]).
01800	
01900	But is this design proper for  @i[any] RS, regardless of its intended  task?  In
02000	
02100	particular,  what about  scientific inference  tasks?  Over  the  years, several
02200	
02300	rule-based inference  systems for scientific  tasks have been  constructed. With
02400	
02500	each new success have come  some deviations from the above principles  [7]. Were
02600	
02700	these  mere aberrations,  or is  there  some valid  reason for  such  changes in
02800	
02900	design?
03000	
03100	
03200	We claim the latter.  The task domain -- scientific discovery -- dictates  a new
03300	
03400	and quite  different architecture for  RSs. To study  this phenomenon,  we shall
03500	
03600	describe, in the next section, one particular RS which defines  new mathematical
03700	
03800	concepts, studies them,  and conjectures relationships between  them. Subsequent
03900	
04000	sections  will  explore the  deviations  of its  design  from  the neo-classical
04100	
04200	constraints in Figure 1.
04300	
04400	
04500	                                                     [%%190]
04600	3.  @u['AM']: A Rule System For Math Theory Formation
04700	
04800	
04900	A recent thesis [13] describes a program, called "AM", which gradually expands a
05000	
05100	base of mathematical  knowledge.  The representation  of math facts  is somewhat
     

00100	Lenat & Harris                                                             p.  7
00200	
00300	
00400	                 [%%7]
00500	related to Actors      [10] and  Beings [12] in the partitioning of  such domain
00600	
00700	knowledge into  effective, structured modules.   Departing from  the traditional
00800	
00900	                                                                         [%%116]
01000	control  structures usually  associated with  Actors, Beings,  and Frames
01100	
01200	[15], AM concentrates on one "interesting" mini-research question after another.
01300	
01400	                            [%%130]
01500	These  "jobs"  are  proposed         by  -- and  rated  by  --  a  collection of
01600	
01700	approximately 250 situation-action rules.  Discovery in mathematics  is modelled
01800	
01900	in AM  as a rule-guided  exploration process.  This  view is explained  below in
02000	
02100	Section 3.1 (See also [21].)  The representation of knowledge is  sketched next,
02200	
02300	followed by a much more detailed description of the rule-based control structure
02400	
02500	of AM.   Finally, in Section  3.5, the experimental  results of the  project are
02600	
02700	summarized.
02800	
02900	
03000	3.1.  Discovery in Mathematics as Heuristic Rule-Guided Search
03100	
03200	
03300	                                                                         [%%190]
03400	The task which AM performs  is the discovery of new  mathematics concepts
03500	
03600	and relationships between them.  The simple paradigm it follows for this task is
03700	
03800	                                                   [%%71]
03900	to maintain a graph of partially-developed concepts       , and to obey  a large
04000	
04100	                          [%%166]
04200	collection of "heuristics"         (rules which frequently lead  to discoveries)
04300	
04400	which guide it to define and study the most plausible thing next.
04500	
04600	
04700	For example, at one point AM had some notions of sets,  set-operations, numbers,
04800	
04900	and  simple  arithmetic.  One  heuristic  rule  it  knew  said  @i["If  f  is an
05000	
05100	interesting relation, Then look at its inverse"].  This rule fired after  AM had
     

00100	Lenat & Harris                                                             p.  8
00200	
00300	
00400	studied "multiplication" for a  while. The rhs of  the rule then directed  AM to
00500	
00600	define   and  study   the  relation   "divisors-of"  (e.g.,   divisors-of(12)  =
00700	
00800	{1,2,3,4,6,12}).  Another heuristic  rule which later fired  said @i["If f  is a
00900	
01000	relation from A into B, then  it's worth examining those members of A  which map
01100	
01200	into @u[extremal] members of B"].  In this case, f was matched to "divisors-of",
01300	
01400	A was "numbers", B was "sets of numbers", and an extremal member of B  might be,
01500	
01600	e.g., a very @i[small]  set of numbers.  Thus  this heuristic rule caused  AM to
01700	
01800	define the  set of  numbers with  no divisors, the  set of  numbers with  only 1
01900	
02000	divisor, with only 2 divisors, etc.  One of these sets (the last  one mentioned)
02100	
02200	turned out subsequently to be  quite important; these numbers are of  course the
02300	
02400	primes.   The  above heuristic  also  directed  AM to  study  numbers  with very
02500	
02600	@i[many]  divisors;  such  highly-composite  numbers  were  also  found   to  be
02700	
02800	interesting.
02900	
03000	
03100	                                                    [%%69]
03200	This same paradigm  enabled AM to discover  concepts       which were  much more
03300	
03400	primitive (e.g., cardinality) and much more sophisticated (e.g., the fundamental
03500	
03600	theorem of arithmetic) than prime numbers.  We shall now describe the AM program
03700	
03800	in more detail.
03900	
04000	
04100	                    [%%165]
04200	3.2.  Representation        of Mathematical Knowledge
04300	
04400	
04500	What exactly does it  mean for AM to "have  the notion of" a concept?   It means
04600	
04700	                                 [%%116]
04800	that  AM possesses  a  frame-like        data  structure for  that  concept. For
04900	
05000	                                 [%%71]
05100	instance, here is how one concept       looked after AM had defined and explored
05200	
05300	it:
     

00100	Lenat & Harris                                                             p.  9
00200	
00300	
00400	                          FIGURE 2: A Typical Concept
00500	
00600	    NAME: Prime Numbers
00700	    DEFINITIONS:
00800	               ORIGIN: Number-of-divisors-of(x) = 2
00900	               PREDICATE-CALCULUS: Prime(x)  iff  
01000	                                   (For-all z) (z|x  implies   z=1  XOR  z=x)
01100	               ITERATIVE: (for x>1): For i from 2 to x-1,  not (i|x)
01200	    EXAMPLES: 2, 3, 5, 7, 11, 13, 17
01300	               BOUNDARY: 2, 3
01400	               BOUNDARY-FAILURES: 0, 1
01500	               FAILURES: 12
01600	    GENERALIZATIONS: Numbers, Numbers with an even number of divisors,
01700	                               Numbers with a prime number of divisors
01800	    SPECIALIZATIONS: Odd Primes, Prime Pairs, Prime Uniquely-addables
01900	    CONJECS: Unique factorization, Goldbach's conjecture, Extrema of Divisors-of
02000	    ANALOGIES: Maximally-divisible numbers are converse extremes of Divisors-of
02100	    INTEREST: Conjec's tying Primes to Times, to Divisors-of, to closely related
02200	    WORTH: 800
02300	
02400	
02500	
02600	3.3.  Top-level Control: An Agenda of Promising Questions
02700	
02800	
02900	AM was initially given a collection of 115 core concepts, with only a few facets
03000	
03100	(i.e.,  slots) filled  in for  each. AM  repeatedly chooses  some facet  of some
03200	
03300	concept, and tries to fill in some entries for that particular slot.   To decide
03400	
03500	which such job  to work on  next, AM maintains an  @i[agenda] of jobs,  a global
03600	
03700	queue  ordered  by priority  [2].   A  typical job  is  @i["Fill-in  examples of
03800	
03900	Primes"].  The agenda  may contain  hundreds of  entries such  as this  one.  AM
04000	
04100	repeatedly selects the top job from the agenda and tries to carry it  out.  This
04200	
04300	is the whole control structure! Of course, we must still explain how  AM creates
04400	
04500	plausible new jobs to place on the agenda, how AM decides which job will  be the
04600	
04700	best one to execute next, and how it carries out a job.
04800	
04900	
05000	If the job were @i["Fill in new Algorithms for Set-union"],  then @i[satisfying]
     

00100	Lenat & Harris                                                            p.  10
00200	
00300	
00400	it would  mean actually  synthesizing some  new procedures,  some new  LISP code
00500	
00600	capable of forming the union of any two sets.  A heuristic rule  is @i[relevant]
00700	
00800	to a job if and only if executing that rule brings AM closer to  satisfying that
00900	
01000	job. Potential relevance is determined @i[a priori] by where the rule is stored.
01100	
01200	A  rule tacked  onto the  Domain/range  facet of  the Compose  concept  would be
01300	
01400	presumed potentially relevant to the job @i["Fill in the Domain of Insert-]o@i[-
01500	
01600	Delete"]. The lhs  of each potentially relevant  rule is evaluated  to determine
01700	
01800	whether the rule is truly relevant.
01900	
02000	
02100	Once a job is  chosen from the agenda,  AM gathers together all  the potentially
02200	
02300	                         [%%78]
02400	relevant  heuristic rules        -- the  ones which  might accomplish  that job.
02500	
02600	They are  executed, and then  AM picks a  new job.  While  a rule  is executing,
02700	
02800	three kinds of actions or effects can occur:
02900	
03000	@i[(i)] Facets of some concepts can get filled in (e.g., examples of  primes may
03100	    actually  be found  and tacked  onto the  "Examples" facet  of  the "Primes"
03200	    concept).  A typical heuristic rule which might have this effect is:
03300	
03400	        If examples  of X are  desired, where X  is a kind  of Y (for  some more
03500	                general concept Y),
03600	        Then check the examples of Y; some of them may be examples of X as well.
03700	
03800	    For the job of filling in examples of Primes, this rule would have AM notice
03900	    that Primes  is a  kind of  Number, and  therefore look  over all  the known
04000	    examples of Number. Some of those would be primes, and would  be transferred
04100	    to the Examples facet of Primes.
04200	
04300	@i[(ii)]  New concepts  may  be created  (e.g.,  the concept  "primes  which are
04400	    uniquely representable  as the sum  of two other  primes" may be  somehow be
04500	    deemed worth studying).  A typical heuristic rule which might result in this
04600	    new concept is:
04700	
04800	        If some (but not  most) examples of X are  also examples of Y  (for some
04900	                concept Y),
05000	        Then  create  a new  concept  defined  as the  intersection  of  those 2
05100	                concepts (X and Y).
     

00100	Lenat & Harris                                                            p.  11
00200	
00300	
00400	    Suppose AM has  already isolated the concept  of being representable  as the
00500	    sum of two primes in only one way (AM actually calls such numbers "Uniquely-
00600	    prime-addable numbers").  When AM notices that some primes are in  this set,
00700	    the  above rule  will create  a brand  new concept,  defined as  the  set of
00800	    numbers which are both prime and uniquely prime addable.
00900	
01000	@i[(iii)] New jobs may  be added to the  agenda (e.g., the current  activity may
01100	    suggest that the following job is worth considering: "Generalize the concept
01200	    of prime numbers").  A typical  heuristic rule which might have  this effect
01300	    is:
01400	
01500	        If very few examples of X are found,
01600	        Then add the following job to the agenda: "Generalize the concept X".
01700	
01800	
01900	The concept of an agenda is certainly not new: schedulers have been around for a
02000	
02100	long time.  But one important feature  of AM's agenda scheme @i[is] a  new idea:
02200	
02300	attaching -- and  using -- a  list of quasi-symbolic  reasons to each  job which
02400	
02500	explain  why  the job  is  worth considering,  why  it's plausible.   It  is the
02600	
02700	responsibility  of the  heuristic rules  to include  reasons for  any  jobs they
02800	
02900	propose. For example, let's reconsider the heuristic rule mentioned in @i[(iii)]
03000	
03100	above.  It really looks more like the following:
03200	
03300	        If very few examples of X are found,
03400	        Then  add  the  following job  to  the  agenda:  "Generalize the
03500	                concept  X", for  the following  reason: "X's  are quite
03600	                rare; a slightly less restrictive concept might  be more
03700	                interesting".
03800	
03900	
04000	If the same job is proposed by several rules, then several different reasons for
04100	
04200	it may  be present.  In  addition, one ephemeral  reason also exists:  "Focus of
04300	
04400	attention" [9]. Any jobs which are  related to the one last executed  get "Focus
04500	
04600	of attention" as  a bonus reason.   AM uses all these  reasons to decide  how to
04700	
04800	rank the jobs on  the agenda.  Each reason is  given a rating (by  the heuristic
04900	
05000	which proposed it), and the ratings are combined into an overall priority rating
     

00100	Lenat & Harris                                                            p.  12
00200	
00300	
00400	for each  job on the  agenda. The jobs  are ordered by  these ratings, so  it is
00500	
00600	trivial to select the job with the highest rating. Note that if a job already on
00700	
00800	the agenda is re-proposed for a new reason, then its priority will  increase. If
00900	
01000	the  job is  re-proposed  for an  already-present reason,  however,  the overall
01100	
01200	rating of  the job will  @i[not] increase.  This turned out  to be  an important
01300	
01400	enough  phenomenon  that  it  was  presented  in  [13]  as  a  necessary  design
01500	
01600	constraint.
01700	
01800	
01900	AM uses each job's list of reasons in other ways. Once a job has  been selected,
02000	
02100	the quality of  the reasons is used  to decide how much  time and space  the job
02200	
02300	will be permitted to absorb, before AM quits and moves on to a new job.  Another
02400	
02500	use is to explain  to the human observer precisely  why the chosen top job  is a
02600	
02700	plausible thing for AM to concentrate upon.
     

00100	Lenat & Harris                                                            p.  13
00200	
00300	
00400	3.4.  Low-level Control: A Lattice of Heuristic Rules
00500	
00600	
00700	The hundreds of concepts AM  possesses are interrelated in many ways.   One main
00800	
00900	                                                     [%%120]
01000	organization is that provided by their Generalization         and Specialization
01100	
01200	facets. The concepts may be viewed  as nodes on a large lattice whose  edges are
01300	
01400	labelled  Genl/Spec.  The importance  of  this organization  stems  from various
01500	
01600	                [%%138]
01700	@i[heritability]         properties.  For  example, Spec  is transitive,  so the
01800	
01900	specializations   of  Numbers   include  not   only  Primes   but   all  @i[its]
02000	
02100	specializations as well.
02200	
02300	
02400	Let us describe a second, very important heritability property.  Each of the 250
02500	
02600	heuristic rules is attached to the most general (or abstract) concept  for which
02700	
02800	it is  deemed appropriate.  The  relevance of heuristic  rules is assumed  to be
02900	
03000	inherited by all its specializations.  For example, a heuristic method  which is
03100	
03200	capable of inverting  any function will be  attached to the  concept "Function";
03300	
03400	but it is certainly also capable  of inverting any permutation. If there  are no
03500	
03600	known methods specific  to the latter  job, then AM  will follow the  Genl links
03700	
03800	upward  from  Permutation  to  Bijection  to  Function...,  seeking  methods for
03900	
04000	inversion.  Of course the more general concepts' methods tend to be  weaker than
04100	
04200	those of the specific concepts.
04300	
04400	
04500	In other words, the Genl/Spec  graph of concepts induces a graph  structure upon
04600	
04700	the  set of  heuristic  rules. This  permits  potentially relevant  rules  to be
04800	
04900	located efficiently. Here is one more example of how this heritability  works in
05000	
05100	practice:  Immediately  after the  job  "Fill in  examples  of  Set-equality" is
     

00100	Lenat & Harris                                                            p.  14
00200	
00300	
00400	chosen, AM asks each generalization  of Set-equality for help. Thus it  asks for
00500	
00600	ways  to fill  in examples  of  any Predicate,  any Activity,  any  Concept, and
00700	
00800	finally for ways to fill in examples of Anything.  One such heuristic rule known
00900	
01000	to the Activity concept  says: @i["If examples of  the domain of the  activity f
01100	
01200	are  already  known, Then  actually  execute f  on  some random  members  of its
01300	
01400	domain."] Thus when AM applies this  heuristic rule to fill in examples  of Set-
01500	
01600	equality, its Domain facet is inspected, and AM notes that Set-equality  takes a
01700	
01800	pair  of sets  as its  arguments. Then  AM accesses  the Examples  facet  of the
01900	
02000	concept Set, where it finds a large list of sets. The lhs is thus  satisfied, so
02100	
02200	the rule is fired.   Obeying the heuristic rule,  AM repeatedly picks a  pair of
02300	
02400	the known  sets at random,  and sees if  they satisfy Set-equality  (by actually
02500	
02600	running  the LISP  function  stored in  the Algorithms  facet  of Set-equality).
02700	
02800	While this will typically return False, it will occasionally locate -- by random
02900	
03000	chance -- a pair of equal sets.
03100	
03200	
03300	Other  heuristics, tacked  onto other  generalizations of  Set-equality, provide
03400	
03500	additional methods for executing the  job "Fill in examples of  Set-equality." A
03600	
03700	heuristic stored on the concept Any-concept says to symbolically instantiate the
03800	
03900	definition. After  spending much time  manipulating the recursive  definition of
04000	
04100	Set-equality, a few trivial examples (like {}={}) are produced.  Notice that (as
04200	
04300	expected) the more general the concept is, the weaker (more time-consuming, less
04400	
04500	chance for  success) its heuristics  tend to be.   For this reason,  AM consults
04600	
04700	each concept's rules in order of increasing generalization.
     

00100	Lenat & Harris                                                            p.  15
00200	
00300	
00400	3.5.  Behavior of this Rule System
00500	
00600	
00700	As  the preceding  four sections  indicate, the  dynamic behavior  of AM  was as
00800	
00900	follows: a job is chosen from the agenda, potentially relevant rules are located
01000	
01100	by their position  in the Genl/Spec lattice,  their lhs's (left-hand  sides) are
01200	
01300	evaluated to find those which actually trigger, they are then executed (in order
01400	
01500	of decreasing specificity) until they are all executed (or until some @i[local],
01600	
01700	self-imposed limit on time or space is exceeded), and the cycle repeats.  AM has
01800	
01900	a modest  facility that  prints out a  description of  these activities  as they
02000	
02100	occur. Here is a tiny excerpt of this self-trace monologue.
02200	
02300	** @u[Job 65:] **   Fill in Examples of the concept "Divisors-of".
02400	
02500	    3 Reasons: (1) No known examples of Divisors-of so far.
02600	             (2) TIMES, which is related to Divisors-of, is now very interesting
02700	             (3) Focus of attention: AM recently defined Divisors-of.
02800	
02900	26 examples found, in 9.2 seconds. e.g., Divisors-of(6)={1 2 3 6}.
03000	
03100	
03200	** @u[Job 66:] **  Consider numbers having small sets of Divisors-of.
03300	
03400	    2 Reasons: (1) Worthwhile to look for extreme cases.
03500	             (2) Focus of attention: AM recently worked on Divisors-of.
03600	
03700	Filling in examples of numbers with 0 divisors.
03800	   0 examples found, in 4.0 seconds.
03900	              [%%144]
04000	   Conjecture:        no numbers have precisely 0 divisors.
04100	
04200	Filling in examples of numbers with 1 divisors.
04300	   1 examples found, in 4.0 seconds. e.g., Divisors-of(1) = {1}.
04400	              [%%173]
04500	   Conjecture:        1 is the only number with precisely 1 divisor.
04600	
04700	Filling in examples of numbers with 2 divisors.
04800	   24 examples found, in 4.0 seconds. e.g., Divisors-of(13) = {1 13}.
04900	   No obvious conjecture. May merit more study.
05000	   Creating a new concept: "Numbers-with-2-divisors".
05100	
05200	Filling in examples of numbers with 3 divisors.
05300	   11 examples found, in 4.0 seconds. e.g., Divisors-of(49) = {1 7 49}.
     

00100	Lenat & Harris                                                            p.  16
00200	
00300	
00400	   All numbers with 3 divisors are also Squares. Definitely merits more study.
00500	   Creating a new concept: "Numbers-with-3-divisors".
00600	
00700	
00800	** @u[Job 67:] **  Consider the square-roots of Numbers-with-3-divisors.
00900	
01000	    2 Reasons: (1) Numbers-with-3-divisors are unexpectedly also Perfect Squares
01100	             (2) Focus of attention: AM recently worked on Nos-with-3-divisors.
01200	
01300	   All square-roots of Numbers-with-3-divisors seem to be Numbers-with-2-divisors
01400	        e.g.,  Divisors-of(Square-root(169)) = Divisors-of(13) = {1 13}.
01500	   Even the converse of this seems empirically to be true.
01600	        i.e.,  the square of each No-with-2-divisors seems to be a No-with-3-divs
01700	        The chance of coincidence is below acceptable limits.
01800	   Boosting the interestingness rating of each of the concepts involved.
01900	
02000	
02100	** @u[Job 68:] **  Consider the squares of Numbers-with-3-divisors.
02200	
02300	    3 Reasons: (1) Squares of Numbers-with-2-divisors were interesting.
02400	             (2) Square-roots of Numbers-with-3-divisors were interesting.
02500	             (3) Focus of attention: AM recently worked on Nos-with-3-divisors.
02600	
02700	
02800	
02900	
03000	Now that we've  seen how AM works,  and we've been exposed  to a bit  of "local"
03100	
03200	results, let's take a moment to discuss the totality of the mathematics which AM
03300	
03400	carried out.   AM began its  investigations with scanty  knowledge of  a hundred
03500	
03600	elementary  concepts of  finite set  theory. Most  of the  obvious set-theoretic
03700	
03800	concepts  and  relationships  were  quickly  found  (e.g.,  de   Morgan's  laws;
03900	
04000	singletons),   but   no  sophisticated   set   theory  was   ever   done  (e.g.,
04100	
04200	diagonalization). Rather, AM discovered  natural numbers and went  off exploring
04300	
04400	elementary   number  theory.    Arithmetic  operations   were  soon   found  (as
04500	
04600	       [%%15]
04700	analogs       to set-theoretic operations),  and AM made surprising  progress in
04800	
04900	divisibility   theory.   Prime   pairs,   Diophantine   equations,   the  unique
05000	
05100	factorization of numbers into  primes, Goldbach's conjecture -- these  were some
     

00100	Lenat & Harris                                                            p.  17
00200	
00300	
00400	of the nice discoveries  by AM. Many concepts which  we know to be  crucial were
00500	
00600	                                    @r[1]
00700	never uncovered,  however: remainder     ,  gcd, greater-than,  infinity, proof,
00800	
00900	etc.
01000	
01100	
01200	All  the  discoveries  mentioned  were  made  in  a  run  lasting  one  cpu hour
01300	
01400	(Interlisp+100k, SUMEX PDP-10 KI). Two  hundred jobs in toto were  selected from
01500	
01600	the agenda and executed. On the  average, a job was granted 30 cpu  seconds, but
01700	
01800	actually used only 18 seconds.  For  a typical job, about 35 rules  were located
01900	
02000	as potentially  relevant, and about  a dozen actually  fired. AM began  with 115
02100	
02200	concepts and ended up with three times that many.  Of the  synthesized concepts,
02300	
02400	half were technically termed "losers" (both  by the author and by AM),  and half
02500	
02600	the remaining ones were of only marginal interest.
02700	
02800	
02900	Although AM fared  well according to  several different measures  of performance
03000	
03100	(see Section  7.1 in  [13]), of greatest  significance are  its @i[limitations].
03200	
03300	This  subsection will  merely report  them, and  the next  section  will analyze
03400	
03500	whether  they  were  caused   by  radical  departures  from   the  neo-classical
03600	
03700	production-system architecture, or from departing not far enough from that early
03800	
03900	design.
04000	
04100	
04200	As AM ran longer  and longer, the concepts  it defined were further  and further
04300	
04400	from  the  primitives  it  began with.  Thus  "prime-pairs"  were  defined using
04500	
04600	"primes" and  "addition", the  former of which  was defined  from "divisors-of",
04700	
04800	
04900	---------------
05000	1
05100	  This  concept,  and  many  of  the  other  "omissions",  @u[could]  have  been
05200	discovered by  the existing heuristic  rules in AM.  The paths which  would have
05300	resulted in their definition were simply never rated high enough to explore.
     

00100	Lenat & Harris                                                            p.  18
00200	
00300	
00400	which in turn came from "multiplication", which arose from "addition", which was
00500	
00600	defined as a  restriction of "union", which  (finally!) was a  primitive concept
00700	
00800	(with heuristics)  that we had  supplied to AM  initially. When  AM subsequently
00900	
01000	needed help with prime pairs, it  was forced to rely on rules of  thumb supplied
01100	
01200	originally about @i[union]ing. Although the heritability property  of heuristics
01300	
01400	did ensure that those rules were still valid, the trouble was that they were too
01500	
01600	general, too weak to deal effectively with the specialized notions of primes and
01700	
01800	arithmetic.  For instance,  one general rule indicated  that A union B  would be
01900	
02000	interesting  if it  possessed properties  absent both  from A  and from  B. This
02100	
02200	translated into the prime-pair case as "@i[If p+q=r, and p,q,r are  primes, Then
02300	
02400	r is interesting if it has properties  not possessed by p or by q.]"  The search
02500	
02600	for categories of such interesting primes @i[r] was of course barren.  It showed
02700	
02800	a fundamental lack of understanding about numbers, addition,  odd/even-ness, and
02900	
03000	primes.
03100	
03200	
03300	As the derived concepts moved further away from finite set theory,  the efficacy
03400	
03500	of the initial  heuristics decreased.  AM began  to "thrash", appearing  to lose
03600	
03700	most of  its heuristic  guidance. It  worked on  concepts like  "prime triples",
03800	
03900	which is not a rational thing  to investigate.  The key deficiency was  the lack
04000	
04100	of  adequate  @i[meta]-rules[6]:   heuristics  which  cause  the   creation  and
04200	
04300	modification of new heuristics.
04400	
04500	
04600	Aside from the preceding major limitation, most of the other problems pertain to
04700	
04800	missing knowledge. Many concepts one  might consider basic to discovery  in math
04900	
05000	are  absent  from  AM; analogies  were  under-utilized;  physical  intuition was
05100	
05200	absent; the interface to the user was far from ideal; etc.
     

00100	Lenat & Harris                                                            p.  19
00200	
00300	
00400	4.  Reexamining the Design
00500	
00600	
00700	Let us now  consider the major  components of a RS's  design and how  AM treated
00800	
00900	them: the DS, the rules, the distribution of knowledge between DS and rules, and
01000	
01100	the rule interpretation policy.  For each component, AM's architecture failed to
01200	
01300	adhere strictly to the pure RS guidelines.  Were these departures worth the loss
01400	
01500	of  simplicity?  Were  the  deviations  due  to  the  task   domain  (scientific
01600	
01700	discovery),  to  the  task  view  (heuristically  guided  growth  of  structured
01800	
01900	theories),  or to  other sources?   These are  the kinds  of questions  we shall
02000	
02100	address in each of the following subsections.
02200	
02300	
02400	4.1.  Data Structures
02500	
02600	
02700	We recognize that a single uniform DS (e.g., an infinite STM [19])  is universal
02800	
02900	in  the  Turing  sense  of  being  @i[formally]  adequate:  One  can  encode any
03000	
03100	representation  in a  linear, homogeneous  DS.  The  completeness of  such  a DS
03200	
03300	design not withstanding, we believe that encouraging several  distinct, special-
03400	
03500	purpose DSs will enhance the performance of a discovery system.  That is, we are
03600	
03700	                                                                     [%%100]
03800	willing to sacrifice aesthetic purity of DSs for clarity, efficiency,        and
03900	
04000	power.  In this section we will explore this tradeoff.
04100	
04200	
04300	The data structures used in AM are unlike the uniform memories suggested  by the
04400	
04500	first design constraint (see Figure 1). One DS -- the agenda -- holds an ordered
04600	
04700	list of plausible questions for the system to concentrate on, a list of  jobs to
04800	
04900	work on.   Another DS is  the graph  of concepts AM  knows about.   Each concept
05000	
05100	itself consists in much structured  information (see Figure 2).  The  reasons AM
     

00100	Lenat & Harris                                                            p.  20
00200	
00300	
00400	has for each job have information associated with them.  Still other information
00500	
00600	is present as values of  certain functions and global variables: the  cpu clock,
00700	
00800	the total number of concepts, the last thing typed out to the user, the last few
00900	
01000	concepts worked  on, etc.  All  these types of  information are accessed  by the
01100	
01200	lhs's  (left  hand  sides)  of heuristic  rules,  and  affected  by  rhs's (some
01300	
01400	"deliberately" in the text of  the rule, some "incidentally" through a  chain of
01500	
01600	if-added methods).
01700	
01800	
01900	Why is there this multitude of  diverse DSs? Each type of knowledge  (jobs, math
02000	
02100	knowledge,  system status)  needs to  be treated  quite differently.   Since the
02200	
02300	primitive operations will vary with  the type of information, so should  the DS.
02400	
02500	For @u[jobs], the primitive kinds of accesses will be: picking the highest-rated
02600	
02700	job, deleting the  lowest-rated one, reordering some  jobs, merging new  ones. A
02800	
02900	natural choice to make these operations efficient is to keep the  system's goals
03000	
03100	in a queue  ordered by their rating  or partially-ordered by those  ratings that
03200	
03300	are commensurable.  For @u[resource information], the usual request is  for some
03400	
03500	@i[statistic]  of some  class of  primary  data.  To  maintain a  table  of such
03600	
03700	summary facts (like how much the CPU clock has run so far, or how  many concepts
03800	
03900	there  are) is  to introduce  an unnecessary  DS and  incur exorbitant  costs to
04000	
04100	maintain many @i[short-lived] entries  that will, most probably, never  be used.
04200	
04300	It is far more  reasonable to run a  summarizing procedure to develop  just that
04400	
04500	ephemeral, up-to-date information that you need.  For @u[math concepts], we have
04600	
04700	a much less volatile situation. We view them as an ever-growing body  of highly-
04800	
04900	interrelated facts. Knowledge  in this form is  stable and rarely  deleted. When
05000	
05100	new knowledge is added,  a great many "routine"  inferences must be drawn.  In a
     

00100	Lenat & Harris                                                            p.  21
00200	
00300	
00400	uniform, linear memory, each would have to be drawn explicitly; in  a structured
00500	
00600	one (as the Genl/Spec graph structure provides) they may be accomplished through
00700	
00800	the tacit (analogical) characteristics of the representation, simply by deciding
00900	
01000	@i[where] to place the information.
01100	
01200	
01300	Each  kind  of  knowledge  dictates a  set  of  appropriate  kinds  of primitive
01400	
01500	operations to be performed on it, which in turn suggest natural  data structures
01600	
01700	in which to realize it. The generality of this perspective on rule-based systems
01800	
01900	is made  more plausible by  examining other  RSs which deal  with many  types of
02000	
02100	knowledge (e.g., [5]).  If this is so, if the design proceeds from "knowledge to
02200	
02300	be represented" to "a data structure  to hold it", then fixing @i[a  priori] the
02400	
02500	capabilities of the DS access primitives available to rules is suspect.
02600	
02700	
02800	Therefore, we advocate the opposite: the RS designer is encouraged to name every
02900	
03000	combination of "machine" operations  that together comprise a  single conceptual
03100	
03200	access of data by rules.  In AM, it is quite reasonable to expect that a request
03300	
03400	like "find  all generalizations of  a given concept"  would be such  a primitive
03500	
03600	(i.e., could be referred to by name).  Even though it might cause  the "machine"
03700	
03800	(in this case, LISP) to run around the Genl/Spec graph, a single rule  can treat
03900	
04000	this as merely an "access" operation.   The use of complex tests and  actions is
04100	
04200	not new; we simply claim  that it is @i[always] preferable to  package knowledge
04300	
04400	(for which a reasonably fast algorithm is available) as a single  action (though
04500	
04600	it may have side-effects in the space of concepts) or a single test (so  long as
04700	
04800	its sole  side-effect -- modulo  caches -- is  to signal).  Primitive  tests and
04900	
05000	actions should be maximally algorithmic, not minimally computational.
     

00100	Lenat & Harris                                                            p.  22
00200	
00300	
00400	The  neo-classical  view of  designing  a  production rule  system  was  that of
00500	
00600	defining a machine. Our present view  is that RSs do not @i[compute] so  much as
00700	
00800	they  @i[guide  attention].   In  adopting  this  view  (thereby  separating the
00900	
01000	controller from the effector), we recognize that we are giving up  an attractive
01100	
01200	feature of pure rule systems: a homogeneous basis for definition.   For example,
01300	
01400	the rule system designer must now spell out in detail the definitions of  the DS
01500	
01600	accessing functions; but  the designer of a  neo-classical RS is simply  able to
01700	
01800	take as @i[givens] the matching  and inserting operations (as specified  in neo-
01900	
02000	classical principle #6, Figure 1),  and he builds each more complicated  one out
02100	
02200	                   @r[2]
02300	of these primitives     .  In  giving up the old view  of the RS as  an abstract
02400	
02500	computing machine, the RS designer must use another homogeneous substrate (e.g.,
02600	
02700	LISP) in terms  of which to  define his DSs  and especially the  procedures that
02800	
02900	process them.  In exchange, he obtains a clear distinction between two  kinds of
03000	
03100	knowledge contained in the  neo-classical rule: plausible proposals for  what to
03200	
03300	do next, and how to accomplish what might be proposed.
03400	
03500	
03600	We have seen that admitting complicated and varied DSs leads to stylized sets of
03700	
03800	DS accesses. The  DSs and their  sets of read/write  primitives must in  turn be
03900	
04000	explicitly defined (coded) by the designer. This seems like a high price to pay.
04100	
04200	Is there  any bright side  to this? Yes,  one rather interesting  possibility is
04300	
04400	opened up. Not only the RS designer, but the RS @i[itself] may define DSs and DS
04500	
04600	access functions.  In AM, this  might take the form of dynamically  defining new
04700	
04800	
04900	---------------
05000	2
05100	   Either by stringing out a sequence of primitives on one side of a rule, or by
05200	handcrafting a  tightly coupled  bundle of rules  (so firing  such a  rule would
05300	simulate traversing one link of the kind that abound in AM's DSs).
     

00100	Lenat & Harris                                                            p.  23
00200	
00300	
00400	kinds of facets (slots). E.g., after "injective Function" is defined,  and after
00500	
00600	some properties of it have been discovered, it would be appropriate to introduce
00700	
00800	a  new  facet called  "inverse"  for each  (concept  representing  an) injective
00900	
01000	function.  In  AM, the  actual definitions of  the facets  of every  concept are
01100	
01200	complex enough  (shared structure), inter-related  enough (shared  meaning), and
01300	
01400	interesting  enough  (consistent heuristic  worth)  that a  special  concept was
01500	
01600	included for  each one  (e.g., a  concept called  "Examples") which  contained a
01700	
01800	definition,  description,...  of  the  facet.   Thus  the  same  techniques  for
01900	
02000	manipulating and discovering math concepts may be applied to DS design concepts.
02100	
02200	Not only do  math theories emerge,  so can new  DS access functions  (new slots;
02300	
02400	e.g., "Small Boundary Examples", "Factorization", or "Inverse").
02500	
02600	
02700	It should be noted  that in opting for non-uniform  DSs, we have not  in general
02800	
02900	                      [%%100]
03000	sacrificed efficiency.        One has only to compare the time to access  a node
03100	
03200	in a tree, versus in a linear list, to appreciate that efficiency may,  in fact,
03300	
03400	be @i[increased] by non-uniformity.
03500	
03600	
03700	Just  how  tangled  up a  DS  should  we tolerate?   Should  memory  elements be
03800	
03900	permitted to refer  to (to "know  about") each other?  We believe the  answer to
04000	
04100	depend upon  the @i[type]  of data  structure involved.  For the  homogeneous DS
04200	
04300	called  for  in  the  neo-classical  design,  much  simplicity  is  preserved by
04400	
04500	forbidding this kind of interrelationship. But consider a DS like AM's  graph of
04600	
04700	concepts. It is growing, analogically interrelated, and it contains descriptions
04800	
04900	of its elements. This richness (and sheer quantity) of information can  be coded
05000	
05100	only  inefficiently  in  a  uniform,  non-self-referential  manner.  For another
     

00100	Lenat & Harris                                                            p.  24
00200	
00300	
00400	example, consider AM's agenda  of jobs. One reason for  a job may simply  be the
00500	
00600	existence of some other  job. In such a case,  it seems natural for part  of one
00700	
00800	entry on the agenda (a reason part of one job) to point to another entry  in the
00900	
01000	same DS  (point to  another specific  job on  the agenda).   Thus, inter-element
01100	
01200	pointers @i[are] allowed, even though  they blur a "pure" distinction  between a
01300	
01400	                     @r[3]
01500	DS  and its  entries.       Inter-element references  play a  necessary  role in
01600	
01700	organizing  large  bodies  of highly  interrelated  information  into structured
01800	
01900	modules.
02000	
02100	
02200	There is yet another motivation for special-purpose DSs when the task of  the RS
02300	
02400	includes  sensing an  external environment.   Using a  uniform  memory, external
02500	
02600	stimuli are dumped into the working memory and rub shoulders with all  the other
02700	
02800	data.  They  must then  be distinguished  from the  others.  ("Must"  because to
02900	
03000	freely intermingle what one sees or is told with what one thinks or remembers is
03100	
03200	to give way to endless confusion.) How much cleaner, less distracting, and safer
03300	
03400	it is for  stimuli to arive in  their own special place  -- a place  which might
03500	
03600	well be a special purpose store such as an intensity @i[array] (not even  a list
03700	
03800	structure at  all), or  a low-level speech-segment  @i[queue].  A  linear memory
03900	
04000	(e.g.,  an infinite  STM) is  of course  adequate; one  could tag  each incoming
04100	
04200	environmental stimulus with  a special flag.  But  the design philosophy  we are
04300	
04400	proposing  is aimed  at  maximizing clarity  and efficiency,  not  uniformity or
04500	
04600	universality.
04700	
04800	
04900	
05000	
05100	---------------
05200	3
05300	   In section 4.3 we will mention work that blurs this distinction even further.
     

00100	Lenat & Harris                                                            p.  25
00200	
00300	
00400	We know that this view of DSs means making a specialized design effort  for each
00500	
00600	class of knowledge incorporated into the RS.  But that is desirable, as  it buys
00700	
00800	us three things: @i[(i)] system performance is increased, @i[(ii)] some forms of
00900	
01000	automatic learning are facilitated, @i[(iii)] knowledge is easier to encode.
01100	
01200	
01300	           [%%225]
01400	4.2.  Rules
01500	
01600	
01700	In the "pure" view of  RSs, the rule store is  not a full-fledged DS of  the RS.
01800	
01900	For  example,  in  Waterman's  [24] poker  player,  rules  may  not  be deleted.
02000	
02100	Rychener [22] states that the only way his RS may inspect rules is  by examining
02200	
02300	the  effect  of those  rules  which have  recently  fired.  Although  AM  had no
02400	
02500	explicit taboo against  inspecting rules, such  analyses were in  practice never
02600	
02700	possible, since the rules were  @i[ad hoc] blocks of LISP code.  This eventually
02800	
02900	turned  out  to be  the  main limitation  of  the design  of  AM.   The ultimate
03000	
03100	impediment to further discovery was the lack of rules which could  reason about,
03200	
03300	modify,  delete, and  synthesize  other rules.  AM direly  needed  to synthesize
03400	
03500	specialized forms of the given  general heuristic rules (as new  concepts arose;
03600	
03700	see the end of 3.5.)
03800	
03900	
04000	We  want  our  heuristic rules  to  be  added, kept  track  of,  reasoned about,
04100	
04200	modified,  deleted,  generalized, specialized,  ...   whenever there  is  a good
04300	
04400	reason to do so.  Note that those situations may be very different from the ones
04500	
04600	in which  such a  rule might fire.   E.g., upon  discovering a  new, interesting
04700	
04800	concept, AM should try to create some specially-tailored heuristic rules for it.
04900	
05000	They  wouldn't  actually  @i[fire]  until  much  later,  when  their  lhs's were
     

00100	Lenat & Harris                                                            p.  26
00200	
00300	
00400	triggered.   After  having constructed  such  rules, AM  might  subject  them to
00500	
00600	criticism and improvement as it explores the new concept.
00700	
00800	
00900	In sum, we have found that  the discovery of heuristic rules for using  new math
01000	
01100	concepts is a necessary part of the growth of math knowledge.   Hence, following
01200	
01300	the argument in 4.1, the rules themselves should be DSs, and each rule  might be
01400	
01500	described by  a concept  with effective  (executable) and  non-effective (purely
01600	
01700	descriptive) facets.  This lesson was  made all the more painful because  it was
01800	
01900	not new [5].  Apparently  the need for reasoning  about rules is common  to many
02000	
02100	tasks.
02200	
02300	
02400	The  current re-coding  of  AM does  in fact  have  each rule  represented  as a
02500	
02600	concept. What kinds of non-effective "facets" do they have?  Recall that  one of
02700	
02800	the features of the original AM (as described in Section 3.3) was that with each
02900	
03000	rule were associated some  symbolic @i[reasons] which it could  provide whenever
03100	
03200	it proposed a new job for the agenda. So one kind of facet which every  rule can
03300	
03400	possess is "Reasons". What others are there?  Some of them @i[describe] the rule
03500	
03600	(e.g., its average cost); some facets  provide a road map to the space  of rules
03700	
03800	(e.g., which  rule schemata  are mere  specializations of  the given  one); some
03900	
04000	facets record its derivation (e.g., the rule was proposed as an analog to rule X
04100	
04200	because ...), its  redundancy (some other  rules need not  be tried if  this one
04300	
04400	is), etc.
04500	
04600	
04700	There are some far-reaching consequences of the need to reason about  rules just
04800	
04900	as if they  were any other  concepts known to AM.   When one piece  of knowledge
     

00100	Lenat & Harris                                                            p.  27
00200	
00300	
00400	                                                                         [%%245]
00500	relates to several  rules, then one  general concept, a  rule @i[schema],
00600	
00700	should exist to hold that common knowledge.  Since each rule is a concept, there
00800	
00900	will be a natural urge to exploit the same Genl/Spec organization that proved so
01000	
01100	useful before. Heritability still holds; e.g., any reason which explains  rule R
01200	
01300	is also somehow a partial explanation of each specialization of R.
01400	
01500	
01600	Rule schemata  have cause to  exist simply because  they generalize --  and hold
01700	
01800	much  information which  would otherwise  have to  be duplicated  in  -- several
01900	
02000	specific rules.  They may  tend to be  "big" and  less directly  productive when
02100	
02200	executed,  yet they  are of  value  in capturing  the essence  of  the discovery
02300	
02400	           @r[4]
02500	techniques.      We put  "big" in quotes because  sheer length (total  number of
02600	
02700	lhs  tests allowed,  total number  of rhs  actions) is  not directly  what we're
02800	
02900	talking about here. A general  rule schema will capture many  regularities, will
03000	
03100	express an  idea common to  several more specific  rules.  It will  contain dual
03200	
03300	forms  of  the  same  rule, sophisticated  types  of  variable-binding  (for the
03400	
03500	duration of the  rule application), and searching  may even be required  to find
03600	
03700	the actions of such a general rule.  We may even wish to consider every  rule in
03800	
03900	the RS as a rule schema of some level of generality, and much processing  may go
04000	
04100	on to  find the  particular instance(s)  of it  which should  be applied  in any
04200	
04300	particular situation.
04400	
04500	
04600	Let  us  consider   a  rule  schema  called   the  "rule  of   enthusiasm".   It
04700	
04800	
04900	---------------
05000	4
05100	   In  AM, even the  specific rules may  be "big" in  the sense that  their very
05200	precise knowledge may involve much  testing to trigger and, once  triggered, may
05300	conclude some elaborate results.
     

00100	Lenat & Harris                                                            p.  28
00200	
00300	
00400	        [%%63]
00500	subsumes       several  rules in  the original  AM system  (pp. 247-8  of [13]),
00600	
00700	e.g., those that said:
00800	
00900	        If  concept  G  is  now  very  interesting,  and  G  was  created  as  a
01000	                generalization of some earlier concept C,
01100	        Give extra  consideration to  generalizing G, and  to generalizing  C in
01200	                other ways.
01300	
01400	
01500	and:
01600	
01700	        If  concept  S  proved  to  be  a  dead-end,  and  S  was  created  as a
01800	                specialization of some earlier concept C,
01900	        Give  less consideration  to specializing  S, and  to specializing  C in
02000	                other ways in the future.
02100	
02200	
02300	The proposed rule schema is:
02400	
02500	        If concept X  has very @u[high/low] interest  and X can be  derived from
02600	                some concept C by means @u[m],
02700	        Give @u[more/less] consideration  to finding (and  elaborating) concepts
02800	                derived from C, X (and their "neighbors") by  @u[means analogous
02900	                      [%%15]
03000	                to m].
03100	
03200	
03300	                        [%%268]                               [%%39]
03400	There are four variables        to  be matched and coordinated       in  the lhs
03500	
03600	of  this rule:  a concept  @u[X], the  direction (high  or low)  of  its extreme
03700	
03800	interest rating, a derivation  procedure @u[m] and an associated  source concept
03900	
04000	@u[C].  The action itself is to search for jobs of a certain type and  give them
04100	
04200	a  corresponding (high  or  low) rating  change.   Three types  of  matching are
04300	
04400	present: @i[(i)] ranging over a set of alternatives which are known at  the time
04500	
04600	the rule is written (e.g., the "high/low" alternative); @i[(ii)] ranging  over a
04700	
04800	set of alternatives which can be accessed easily at any moment the rule  is run,
04900	
05000	like the set  of concepts and connections  between them now in  existence (e.g.,
     

00100	Lenat & Harris                                                            p.  29
00200	
00300	
00400	the variables X and C range over this kind of set); @i[(iii)] ranging over a set
00500	
00600	of alternatives  which must be  heuristically searched for  as part of  the rule
00700	
00800	execution (e.g., "analogous" and "neighbors" only make sense after  a nontrivial
00900	
01000	amount of searching has been performed).
01100	
01200	
01300	Since the "rule of enthusiasm" is very general, it will only be tried if no more
01400	
01500	specific rules (such as the two which were listed just above it) are relevant at
01600	
01700	the  time.  Ideally,  the search  to  specify the  action should  create  a new,
01800	
01900	specialized form of the rule of enthusiasm to catch this situation and handle it
02000	
02100	quickly, should it arise again.  Note that versions of this schema  that mention
02200	
02300	generalization or  specialization are also  schemata (without  any specification
02400	
02500	search);  they are  simply less  general schemata  than the  rule  of enthusiasm
02600	
02700	itself.  Whenever a new subject for discovery gets defined, the  abstract, hard-
02800	
02900	                                                                [%%240]
03000	to-execute rule  schemata can be  specialized (compiled, refined         , etc.)
03100	
03200	into efficient heuristics for that subject.
03300	
03400	
03500	Another use of a rule schema might be to @i[name] a collection  of neo-classical
03600	
03700	rules that  are coupled by  together fulfilling a  single function.   Consider a
03800	
03900	collection of rules which is tightly coupled, say to perform an  iteration. Much
04000	
04100	knowledge  about  the  iteration  loop  as a  whole  may  exist.  Where  is such
04200	
04300	descriptive information to  be stored and sought?  Either it must  be duplicated
04400	
04500	for each of the coupled rules, or there must be a rule-like concept which "knows
04600	
04700	about"  the iteration  as one  coherent  unit.  We  conclude that  even  if some
04800	
04900	intertwined rules @i[are] kept separate,  an extra rule (a schema)  should exist
05000	
05100	which  (at  least implicitly)  has  a  rhs which  combines  them  (by containing
     

00100	Lenat & Harris                                                            p.  30
00200	
00300	
00400	knowledge common to all  of them).  Thus rule  schemata do more than  just unify
00500	
00600	general properties of rules: there must also be schemata of the kind that relate
00700	
00800	function to mechanism.
00900	
01000	
01100	Another problem crops up if we consider what happens if one of the coupled rules
01200	
01300	is  modified.  Often,  some  corresponding change  should  be  made  in  all its
01400	
01500	companions. For  example, if a  term is generalized  (replacement of  "prime" by
01600	
01700	"number" everywhere) then the same  substitution had probably better be  done in
01800	
01900	each rule  with which this  one is supposed  to couple.  What  we are  saying is
02000	
02100	that, for RSs which  modify their own rules, it  can be dangerous to split  up a
02200	
02300	single conceptual process into a bunch  of rules which interact in more  or less
02400	
02500	fixed ways when  run, without continuing to  reason about them as  an integrity,
02600	
02700	@i[like any other algorithm] composed of parts.  Here again, we find pressure to
02800	
02900	treat RSs as algorithms, not vice-versa.
03000	
03100	
03200	Finally,  let us  make a  few irresistable  observations.  The  whole  notion of
03300	
03400	coupling via meaningless tokens is aesthetically repugnant and quite contrary to
03500	
03600	"pure" production  system spirit. By  "meaningless" we mean  entries in  DS that
03700	
03800	provide  a narrow  hand-crafted channel  of communication  between  two specific
03900	
04000	                                             @r[5]
04100	rules that therefore "know about each other".      At the least, when  a coupled
04200	
04300	rule deposits  some "intermediate-state" message  in a DS,  one would  like that
04400	
04500	message to be meaningful to many rules in the system, to have  some significance
04600	
04700	itself. We can  see that entries in  a DS have an  expected meaning to  the read
04800	
04900	
05000	---------------
05100	5
05200	  By contrast, a "meaningful" DS entry will embody a piece of  information which
05300	is specific to the RS's task, not to the actual rules themselves.
     

00100	Lenat & Harris                                                            p.  31
00200	
00300	
00400	                                      @r[6]
00500	access functions that  examine the DS.      If  this purity is  maintained, then
00600	
00700	any apparent "coupling" would be merely superficial: each rule could stand alone
00800	
00900	as a whole domain-dependent heuristic. Thus no harm should come from  changing a
01000	
01100	single  rule,  and more  rules  could be  added  that act  on  the "intermediate
01200	
01300	message"  of  the  coupling.   Such  meaningful,  dynamic  couplings  should  be
01400	
01500	encouraged. Only the meaningless, tight couplings are being criticized here.
01600	
01700	
01800	4.3.  Distribution of Knowledge Between Rules and DS
01900	
02000	
02100	A  common  "pure"  idea  is  that  all  knowledge  of  substance  ought   to  be
02200	
02300	           [%%165]
02400	represented         as  rules.   Independent  of such  rules,  the  DS  forms no
02500	
02600	meaningful whole initially,  nor has it  any final interpretation.  The "answer"
02700	
02800	which the RS computes  is not stored in the  DS; rather, the answer  consists in
02900	
03000	                             @r[7]
03100	the process of  rule firings.      The DS  is "just" an intermediate  vehicle of
03200	
03300	control information.
03400	
03500	
03600	Contrary to this, we say that rules ought to have a @i[  symbiotic] relationship
03700	
03800	to DSs.  The DSs hold meaningful domain-dependent information, and rules process
03900	
04000	knowledge represented in them.  For RSs designed to perform scientific research,
04100	
04200	the DSs contain the theory, and the rules contain methods of theory formation.
04300	
04400	
04500	---------------
04600	6
04700	   Perhaps this "meaning" could even be expressed formally as an invariant which
04800	the write access functions for the DS must never violate.
04900	
05000	
05100	7
05200	    The sequence  of actions  in time.  In addition,  perhaps, the  "answer" may
05300	involve a few of their side-effects. E.g., (Respond 'YES').
     

00100	Lenat & Harris                                                            p.  32
00200	
00300	
00400	But  much domain-dependent  knowledge is  conditional.  E.g.,  "If n  and  m are
00500	
00600	relatively  prime  and  divide  x, then  so  must  nm".  Shouldn't  such If/Then
00700	
00800	information be encoded as rules? We answer an emphatic @i[No].  Just as there is
00900	
01000	a distribution  of "all knowledge  of substance" between  rules and DSs,  so too
01100	
01200	must  the  conditional  information  be  partitioned  between  them.   We  shall
01300	
01400	illustrate two particular issues: (i) Much information can be  stored implicitly
01500	
01600	in DSs; (ii) Some conditional knowledge is inappropriate to store as rules.
01700	
01800	
01900	When designing a  DS, it is  possible to provide  mechanisms for holding  a vast
02000	
02100	amount of information @i[implicitly].  In AM, e.g., the organization of concepts
02200	
02300	into a Genl/Spec hierarchy  (plus the assumed heritability properties;  see 3.4)
02400	
02500	permits a rule  to ask for  "all concepts more general  than Primes" as  if that
02600	
02700	were  a  piece  of  data  explicitly  stored  in  a  DS.  In  fact,  only direct
02800	
02900	generalizations  are  stored   ("The  immediate  generalization  of   Primes  is
03000	
03100	Numbers"), and a  "rippling" mechanism automatically runs  up the Genl  links to
03200	
03300	assemble  a complete  answer. Thus  the number  of specific  answers the  DS can
03400	
03500	provide is  far greater than  the number  of individual items  in the  DS. True,
03600	
03700	these DS mechanisms will use up  extra time in processing to obtain  the answer;
03800	
03900	this is efficient since any particular request is very unlikely to be made. Just
04000	
04100	as each rule knows  about a general situation, of  which it will only see  a few
04200	
04300	instances,  that  same quality  (of  wide potential  applicability)  is  just as
04400	
04500	valuable  for knowledge  in DSs.   These are  situations where,  like Dijkstra's
04600	
04700	multiplier  [8], the  mechanism  must provide  any  of the  consequences  of its
04800	
04900	knowledge quickly on  demand, but in  its lifetime will only  be asked a  few of
05000	
05100	them.
     

00100	Lenat & Harris                                                            p.  33
00200	
00300	
00400	Now that we have seen how tacit information @i[can] be encoded into DSs,  let us
00500	
00600	see some cases where  it @i[should] be -- i.e.,  where it is not  appropriate to
00700	
00800	encode it as rules of the system.  Many things get called implication,  and only
00900	
01000	some of  them correspond to  rule application.  For  instance, there  is logical
01100	
01200	                                                            [%%52]
01300	entailment (e.g.,  if A  and B then  A), physical  causation       (e.g.,  if it
01400	
01500	rains, then the ground will get wet), probable associations (e.g., if it  is wet
01600	
01700	underfoot, then it  has probably been raining.)  These all describe the  way the
01800	
01900	world is, not the  way the perceiver of  the world behaves.  Contrast  them with
02000	
02100	knowledge of the form "If it is raining, then open the umbrella".  We claim that
02200	
02300	this last kind of situation-action  relationship should be encoded as  rules for
02400	
02500	the RS, but that the  other types of implication should be  stored declaratively
02600	
02700	within the DS. Let's try to justify this distinction.
02800	
02900	
03000	The situation-action rules indicate imperatively how to behave in the world; the
03100	
03200	other types of implication merely indicate expected relationships and tendencies
03300	
03400	within the world.  The rules of a RS are meant to indicate  potential procedural
03500	
03600	actions which are obeyed by the system, while the DSs indicate the way the world
03700	
03800	(the RSs environment) behaves in terms of some model of it.  The essential thing
03900	
04000	to consider is what relations are to be @i[caused in time;] these are the things
04100	
04200	we should cast as  rules.  The lhs of a  rule measures some aspect  of knowledge
04300	
04400	presently in DSs, while the rhs of the rule defines the attention of  the system
04500	
04600	(regarded as a processor feeding off of the DS) in the immediate future.
04700	
04800	
04900	This is  the heart  of why rule-sets  are algorithms.   They are  algorithms for
05000	
05100	guiding the application of  other (DS processing) algorithms.  It  also explains
     

00100	Lenat & Harris                                                            p.  34
00200	
00300	
00400	why other  kinds of  implications are  unsuitable to  be rules.  Consider causal
00500	
00600	implication  ("Raining -->  Wet").  While  the  lhs could  be a  rule's  lhs (it
00700	
00800	measures an aspect of any situation), the rhs should @i[not] be a rule's rhs (it
00900	
01000	                                                                @r[8]
01100	does not indicate an appropriate action for the system to take).
01200	
01300	
01400	Most purist production systems have  (often implicitly!) a rule of the  form "If
01500	
01600	the left side of an implication  is true in the database, Then assert  the right
01700	
01800	side".  This is  only  one kind  of rule,  of  course, capable  of  dealing with
01900	
02000	implications.   For  example,  MYCIN  and LT  [17]  (implicitly)  follow  a very
02100	
02200	different rule: "If the rhs of an implication will satisfy my goal, Then the lhs
02300	
02400	                                          [%%30]
02500	of the implication  is now the new  goal".       Other rules are  possible; many
02600	
02700	rules for reasoning may feed off the same "table" of world knowledge.  The point
02800	
02900	is that the  implications themselves are  declarative knowledge, not  rules.  In
03000	
03100	summary, then, it may be very important to distinguish rules  (attention guides)
03200	
03300	from mere implications (access guides), and to store the latter within  the DSs.
03400	
03500	This policy was not  motivated by the scientific  inference task for our  RS. We
03600	
03700	believe it to be a worthwhile guideline in the design of @i[any] RS.
03800	
03900	
04000	
04100	
04200	
04300	
04400	---------------
04500	8
04600	  In a RS that  aspires to any generality at  all, an antecedent theorem  of the
04700	form "if [you know that] it is raining, then [assert that] it is wet" is not the
04800	appropriate form to  store this knowledge; it  is too compiled a  form, standing
04900	alone. If "told" (or given) a  rule like this, a learning system  should "parse"
05000	it as a familiar kind of deduction, file the residue of new information  away as
05100	a  conjectured  tendency of  wetness  to  follow rain,  and  start  checking for
05200	exceptions. A sophisticated (and  lucky) discovery RS might thereby  develop the
05300	concept of "shelter".
     

00100	Lenat & Harris                                                            p.  35
00200	
00300	
00400	4.4.  Interpreter
00500	
00600	
00700	                                                    [%%159]
00800	After a rule fires, the neo-classical interpretation        policy (#9 in Figure
00900	
01000	1) demands  that @i[any]  rule in  the system  can potentially  be the  next one
01100	
01200	selected to fire. This is true regardless of the speed-up techniques used in any
01300	
01400	particular implementation (say, by preprocessing the lhs's into a discrimination
01500	
01600	net [22]).  But consider RSs for scientific discovery tasks. Their task  -- both
01700	
01800	at the  top level  and frequently  at lower  levels --  is quite  open-ended. If
01900	
02000	twenty rules trigger as relevant to such an open-ended activity (e.g., gathering
02100	
02200	                                     [%%144]
02300	empirical data, inducing conjectures,        etc.) then there is much motivation
02400	
02500	for continuing to  execute just these  twenty rules for  a while.  They  form an
02600	
02700	@i[ad hoc] plausible search algorithm for the agenda item selected.
02800	
02900	
03000	A RS for discovery might reasonably be given a complex  interpreter (rule-firing
03100	
03200	policy). AM,  for example,  experimented with a  two-pass interpreter:  first, a
03300	
03400	best-first, agenda-driven resource allocator and attention focusser  selects the
03500	
03600	job it  finds most  interesting; second, it  locates the  set of  relevant rules
03700	
03800	(typically about  30 to 40  rules) for  the job, and  begins executing  them one
03900	
04000	after another (in best-first order of specificity) until the resources allocated
04100	
04200	in the first step run out [20].  The overall rating of the job which these rules
04300	
04400	are to satisfy determines the amount of cpu time and list cells that may be used
04500	
04600	up before the rules are interrupted and job is abandoned.
04700	
04800	
04900	For example, say the job were  "Find examples of Primes".  It's allotted  35 cpu
05000	
05100	seconds and 300 list  cells, due to its  overall priority rating just  before it
     

00100	Lenat & Harris                                                            p.  36
00200	
00300	
00400	was plucked from the agenda.  Say, 24 rules are relevant. The first  one quickly
00500	
00600	finds that "2" and  "3" are primes. Should the  job halt right then? No,  not if
00700	
00800	the real reason for  this job is to gather  as much data as possible,  data from
00900	
01000	which conjectures will be suggested and tested. In that case, many of  the other
01100	
01200	23 rules  should be  fired as well.  They will  produce not  only @i[additional]
01300	
01400	examples, but perhaps other @i[types] of examples.
01500	
01600	
01700	The  jobs on  AM's  agenda are  really  just mini-research  questions  which are
01800	
01900	plausible to  spend time investigating.  Although phrased as  specific requests,
02000	
02100	each one is really a research proposal, a topic to concentrate upon. We found it
02200	
02300	necessary to deviate from the simplest uniform interpreter for clarity  (e.g., a
02400	
02500	human can follow the first-pass  (job selection) taken alone and can  follow the
02600	
02700	second-pass  (job execution)  by itself),  for efficiency  (knowing that  all 24
02800	
02900	rules are  relevant, there  is no  need to find  them 35  times), and  for power
03000	
03100	(applying  qualitatively  different  kinds  of  rules  yields  various  types of
03200	
03300	examples).  We claim this quality  of open-endedness will recur in any  RS whose
03400	
03500	task is free concept exploration. This includes all scientific discovery but not
03600	
03700	                         [%%150]
03800	all scientific inference.
03900	
04000	
04100	5.  Speculations for a New Discovery System
04200	
04300	
04400	The spirit of this paper has  been to give up straightforward simplicity  in RSs
04500	
04600	for clarity,  efficiency, and power.  Several examples have  been cited,  but we
04700	
04800	speculate that there are further tradeoffs of this kind which are  applicable to
04900	
05000	RSs whose purpose is to make new discoveries.
     

00100	Lenat & Harris                                                            p.  37
00200	
00300	
00400	Often, there are several  possible ways the designer  may view the task  of (and
00500	
00600	subtasks of) the intended RS. We wish  to add the notion of "proof" to  AM, say.
00700	
00800	Should we represent proof as a resolution search, as a process of  criticism and
00900	
01000	improvement [11] spiralling toward  a solution, as a natural  deduction cascade,
01100	
01200	...? Although any one of these task-views might perform respectably, we advocate
01300	
01400	the  incorporation of  all  of them,  despite  the concommitant  costs  of added
01500	
01600	processing time, space, and interfacing.  In fact, we wish never to  exclude the
01700	
01800	possibility of the system acquiring another task-view.
01900	
02000	
02100	We look for the  development of further discovery  tools in the form  of domain-
02200	
02300	independent meta-heuristics that synthesize heuristic rules, and in the  form of
02400	
02500	        [%%1]
02600	abstract      heuristic schemata that  specialize into efficient rules  for each
02700	
02800	newly-discovered  domain.   These  discovery  tools  are  all  part  of "getting
02900	
03000	familiar" with shallowly understood  concepts, such as synthesized ones  tend to
03100	
03200	be initially.  It  may even be that  symbolic analogy techniques  exist, cutting
03300	
03400	across the traditional boundaries of knowledge domains.
03500	
03600	
03700	We contemplate  a system  that keeps  track of  (and has  methods with  which it
03800	
03900	attempts to improve) the design of  its own DSs, its own control  structure, and
04000	
04100	perhaps even its own design constraints.  Although working in (a  collection of)
04200	
04300	specific domains, this would be a general @i[symbol system  discoverer], capable
04400	
04500	of picking up and exploring formulations, testing them and improving them.
04600	
04700	
04800	5.1.  A New Set of Design Constraints
04900	
05000	
05100	Below are  13 principles for  designing a  RS whose task  is that  of scientific
     

00100	Lenat & Harris                                                            p.  38
00200	
00300	
00400	                  [%%150]
00500	theory  formation.         They are  the  result of  reconsidering  the original
00600	
00700	principles (Figure  1) in  the light  shed by work  on AM.   Most of  the "pure"
00800	
00900	principles  we mentioned  in  Figure 1  are changed,  and  a few  new  ones have
01000	
01100	emerged.
01200	
01300	
01400	                                                            [%%23]
01500	              FIGURE 3: Scientific Discovery RS Architecture
01600	
01700	       @r[1. ]Principle of Several Appropriate Memories.  For each  type of
01800	         knowledge which must be dealt  with in its own way, a  separate DS
01900	         should  be maintained.  The precise  nature of  each DS  should be
02000	         chosen  so as  to  facilitate the  access  (read/write) operations
02100	         which will be most commonly requested of it.
02200	
02300	       @r[2. ]Principle of  Maximal DS Accesses.   The set of  primitive DS
02400	         access operations  (i.e., the  read tests which  a rule's  lhs may
02500	         perform,  and the  write actions  which a  rhs may  call  for) are
02600	         chosen to include  the largest packages (clusters,  chunks,...) of
02700	         activity  which are  commonly needed  and which  can  be performed
02800	         efficiently on the DS.
02900	
03000	       @r[3. ]Principle  of Facetted  DS Elements.   For  ever-growing data
03100	         structures,  there  is  much  to  be  gained  and  little  lost by
03200	         permitting parts  of one DS  item to point  to other DS  items. In
03300	         particular,  schematic  techniques  of  representing   content  by
03400	         structure are now possible.
03500	
03600	       @r[4. ]Principle of Rules as  Data.  The view which the  RS designer
03700	         takes of the system's task may require that some rules  be capable
03800	         of reasoning about the rules in the RS (adding new  ones, deleting
03900	         old ones, keeping track of rules' performance,  modifying existing
04000	         rules,...).  Some  of  the  methods  the  RS  uses  to  deal  with
04100	         scientific knowledge  may be applicable  to dealing with  rules as
04200	         well.  In such  cases, the  system's rules  may thus  be naturally
04300	         represented  as new  entries in  the existing  DS which  holds the
04400	         scientific theory.
04500	
04600	       @r[5. ]Principle of Regularities Among Rules.  Each rule is actually
04700	                                                      [%%64]
04800	         a rule  @u[schema]. Sophisticated  processing       may  be needed
04900	         both to determine which  instance(s) are relevant and to  find the
05000	         precise  sequence of  actions to  be executed.  Such  schemata are
05100	         often quite elaborate.
     

00100	Lenat & Harris                                                            p.  39
00200	
00300	
00400	       @r[6. ]Principle of  Avoiding Meaninglessly-Coupled  Rules.  Passing
00500	         special-purpose loop control notes  back and forth is  contrary to
00600	         both the spirit  of pure RSs and  to efficiency.  If rules  are to
00700	         behave as  coupled, the  least we  demand is  that the  notes they
00800	         write and  read for each  other be meaningful  entries in  DS (any
00900	         other rule may interpret the same note, and other rules might have
01000	         written one identical to it).
01100	
01200	       @r[7. ]Principle of  Controlled Environment. For  many tasks,  it is
01300	         detrimental to  permit external stimuli  (from an  environment) to
01400	         enter any DS at  random.  At the least,  the RS should be  able to
01500	         distinguish  these  alien  inputs  from   internally-generated  DS
01600	         entries.
01700	
01800	       @r[8. ]Principle  of Tacit  Knowledge.   In designing  the  DS, much
01900	         knowledge may be stored  @u[implicitly]; e.g., by where  facts are
02000	         placed in a hierarchical network.  The DS should be designed so as
02100	         to  maximize  this kind  of  concentrated,  analogical information
02200	         storage.   Hence,  hard-working  access  functions  are  needed to
02300	         encode and decode the full meaning of DSs.
02400	
02500	       @r[9. ]Principle  of  Named   Algorithms.   When  basic,   "how  to"
02600	         knowledge is available, it should be packaged as an  operation and
02700	         used as a part of the lhs or rhs of various rules.  Embodying this
02800	         chunk of  knowledge as several  coupled rules is  not recommended,
02900	         for we  will want to  manipulate and utilize  this knowledge  as a
03000	         whole.
03100	
03200	       @r[10. ]Principle of Rules as Attention Guides.  Knowledge should be
03300	         encoded as rules when  it is intended to  serve as a guide  of the
03400	         system's  attention;  to  direct  its  behavior.   Other  kinds of
03500	         information,  even  if  stated  in  conditional  form,  should  be
03600	         relegated to DSs (either  explicitly as entries, or  implicitly as
03700	         special access functions).
03800	
03900	                                                    [%%77]
04000	       @r[11. ]Principle  of  Inertial  Interpreter.        In  tasks  like
04100	         scientific  research,  where  relevant  rules  will  be performing
04200	         inherently  open-ended  activities  (e.g.,  data-gathering),  such
04300	         rules should be  allowed to continue for  a while even  after they
04400	         have nominally carried out the activity (e.g., gathered  one piece
04500	         of data).  In such cases, the occasional wasted time and  space is
04600	         more than compensated for by the frequent acquisition  of valuable
04700	         knowledge  that  was   concentrated  in  the  later   rules.   For
04800	         scientific  discovery,  no  single  rule  (however  "appropriate")
04900	         should be taken as sufficient: a single rule must necessarily view
05000	         the task in just one  particular way.  All views of the  task have
05100	                                                                    [%%221]
05200	         something to contribute; hence variety depends on  a policy
05300	         of always applying several rules.
     

00100	Lenat & Harris                                                            p.  40
00200	
00300	
00400	       @r[12. ]Principle  of  Openness.   A discovery  rule  system  can be
00500	         enriched  by  incorporating into  its  design  several independent
00600	         views  of the  knowledge it  handles. Never  assume  everything is
00700	         known about a class of knowledge.  All appropriate formulations of
00800	         a  knowledge class  have  something to  contribute;  hence variety
00900	         depends on openness to new formulations.
01000	
01100	       @r[13. ]Principle   of   Support  of   Discovery   by   Design.   By
01200	         representing its own design explicitly (say, as concepts),  the RS
01300	         could study and improve those concepts, thereby  improving itself.
01400	                                    @r[9]
01500	         This includes the DS design     , the access  function algorithms,
01600	         how  to   couple  them,  the   function  of  various   rules,  the
01700	         interpretation  policy of  the RS,  etc.  This  suggests  that the
01800	         study of designs of computational mechanisms may be a  worthy area
01900	         for  a  discovery system  to  pursue, whether  its  own  design is
02000	         available to it or not.
02100	
02200	
02300	
02400	Rule systems whose designs adhere to these guidelines will be  large, elaborate,
02500	
02600	and  non-classical.  We  have   mentioned  throughout  the  paper   several  new
02700	
02800	complications which the principles introduce.  Trying to produce such a RS for a
02900	
03000	task for which a pure, neo-classical production rule system was appropriate will
03100	
03200	probably result in disaster. Nevertheless, empirical evidence suggests  that RSs
03300	
03400	having  this  architecture are  quite  natural --  and  relatively  tractable to
03500	
03600	construct -- for open-ended tasks like scientific discovery.
03700	
03800	
03900	
04000	                              @i[ACKNOWLEDGEMENT]
04100	
04200	This research builds upon Lenat's  Ph.D.  thesis at Stanford University,  and he
04300	wishes  to deeply  thank  his advisers  and committee  members:  Bruce Buchanan,
04400	Edward Feigenbaum, Cordell Green, Donald Knuth, and Allen Newell.   In addition,
04500	he gladly  acknowledges the ideas  he received in  discussions with  Dan Bobrow,
04600	Avra Cohn, and Randy Davis.  Similarly, ideas received by Harris in two long and
04700	fruitful  associations,  with  John  Seely Brown  and  with  Roger  Schank, have
04800	contributed to this work.  Many of our ideas have evolved through discussions at
04900	
05000	---------------
05100	9
05200	   e.g., the facet specifications. If the input/output requirements  change with
05300	time, so should the rule system's data structures.
     

00100	Lenat & Harris                                                            p.  41
00200	
00300	
00400	CMU this past year, notably with Don Cohen, John McDermott,  Kamesh Ramakrishna,
00500	Paul Rosenbloom, James Saxe, and especially Herbert Simon.
00600	
00700	
00800	
00900	                                 @i[REFERENCES]
01000	
01100	[0]   Bledsoe, W. W., and Tyson, M., @i[The UT Interactive  Prover], University
01200	      of Texas at Austin, Depts. of Mathematics and Computer  Sciences Automatic
01300	      Theorem Proving Project Report No. 17, May, 1975.
01400	[1]   Bobrow,  D.,  "Natural  Language  Input  for  a  Computer  Problem Solving
01500	      System", in (Minsky, M., editor), @i[Semantic Information Processing], The
01600	      MIT Press, Cambridge, Massachusetts, 1968.
01700	[2]   Bobrow,  D.,  and  Winograd,  T.,  @i[An  Overview  of  KRL,  A  Knowledge
01800	      Representation  Language],  Journal of  Cognitive  Science, Vol  1,  No 1,
01900	      January 1977.
02000	[3]   Bobrow, R., and Brown,  J. S., "Systematic Understanding, in  (Bobrow, D.,
02100	      and  Collins, A.,  eds.), @i[Representation  and  Understanding], Academic
02200	      Press, S.F., 1975.
02300	[4]   Buchanan,  Bruce  G.,  G.   Sutherland,  and  E.  Feigenbaum, @i[Heuristic
02400	      Dendral:  A  Program  for  Generating  Explanatory  Hypotheses  in Organic
02500	      Chemistry], in (Meltzer and Michie, eds.) Machine Intelligence 4, American
02600	      Elsevier Pub., N.Y., 1969, pp. 209-254.
02700	[5]   Buchanan,  Bruce  G.,   @i[Applications  of  Artificial   Intelligence  to
02800	      Scientific Reasoning], Second USA-Japan Computer Conference, Tokyo, August
02900	      26-28.  Published by AFIPS and IPSJ, Tokyo, 1975, pp. 189-194.
03000	[6]   Davis,   Randall,  @i[Applications   of  Meta   Level  Knowledge   to  the
03100	      Construction, Maintenance,  and Use of  Large Knowledge Bases],  SAIL AIM-
03200	      271, Artificial Intelligence Laboratory, Stanford University, July, 1976.
03300	[7]   Davis, R.,  and King,  J., @i[An overview  of production  systems], Report
03400	      STAN-CS-75-524, Memo AIM-271, Stanford U. CS Department, 1975.
03500	[8]   Dijkstra, E. W., "Notes on Structured Programming", in Dahl, Dijkstra, and
03600	      Hoare, @i[Structured Programming], Academic Press, London, 1972, pp. 1-82.
03700	[9]   Hayes-Roth, Frederick, and  Victor R. Lesser,  @i[Focus of Attention  in a
03800	      Distributed Speech Understanding  System], Computer Science Dept.   Memo ,
03900	      Carnegie Mellon University, Pittsburgh, Pa., January 12, 1976.
04000	[10]   Hewitt,  Carl,  @i[Viewing  Control  Structures  as  Patterns  of Passing
04100	      Messages], MIT AI Lab Working Paper 92, April, 1976.
04200	[11]   Lakatos, Imre, @i[Proofs and Refutations], Cambridge U. Press, 1976.
04300	[12]   Lenat,  D.,  @i[BEINGs:  Knowledge as  Interacting  Experts],  4th IJCAI,
04400	      Tbilisi, Georgian SSR, USSR, 1975.
04500	[13]   Lenat, D.,  @i[AM: An  Artificial Intelligence  Approach to  Discovery in
04600	      Mathematics as  Heuristic Search],  SAIL AIM-286,  Artificial Intelligence
04700	      Laboratory, Stanford University,  July, 1976.  Jointly issued  as Computer
04800	      Science Dept. Report No. STAN-CS-76-570.
04900	[14]   McCracken, Don, @i[A  Parallel Production System Architecture  for Speech
05000	      Understanding], CMU CS Dept. Ph.D. Thesis, 1977.
05100	[15]   Minsky, Marvin,  "A Framework for  Representing Knowledge",  in (Winston,
05200	      P., ed.), @i[The Psychology  of Computer Vision], McGraw Hill,  N.Y. 1975,
05300	      pp. 211-277.
     

00100	Lenat & Harris                                                            p.  42
00200	
00300	
00400	[16]   Moran,   T.P.,   @i[The  symbolic   imagery   hypothesis:   An  empirical
00500	      investigation via a  production system simulation  of human behavior  in a
00600	      visualization task], CMU CS Dept. Thesis, 1973.  See also 3IJCAI, pp. 472-
00700	      477.
00800	[17]   Newell, Allen, J.  Shaw, and H.  Simon, @i[Empirical Explorations  of the
00900	      Logic Theory Machine:  A Case Study in  Heuristics], RAND Corp.  Report P-
01000	      951, March, 1957.
01100	[18]   Newell, Allen, and  Simon, Herbert, @i[Human Problem  Solving], Prentice-
01200	      Hall, Englewood Cliffs, New Jersey, 1972.
01300	[19]   Newell, A.,  @i[Production Systems: Models  of Control  Structures], May,
01400	      1973  CMU  Report,  also   published  in  (W.G.   Chase,   ed.)  @i[Visual
01500	      Information Processing], NY: Academic Press, Chapter 10, pp.  463-526.
01600	[20]   Norman,  D.,  and  D.  Bobrow,  @i[On  Data-limited  and Resource-limited
01700	      Processes], Journal of Cognitive Psychology, Volume 7, 1975, pp.  44-64.
01800	[21]   Polya,  George,   @i[Mathematics  and  Plausible   Reasoning],  Princeton
01900	      University Press, Princeton, Vol. 1, 1954; Vol. 2, 1954.
02000	[22]   Rychener,  M. D.,  @i[Production systems  as a  programming  language for
02100	      artificial  intelligence applications].   Pittsburgh,  Pa: Carnegie-Mellon
02200	      University, Department of Computer Science.  1976.
02300	[23]   Shortliffe, E. H., @i[MYCIN -- A rule-based computer program for advising
02400	      physicians regarding  antimicrobial therapy  selection], Stanford  AI Memo
02500	      251, October, 1974.
02600	[24]   Waterman, D. A., @i[Adaptive Production Systems], CIP Working  Paper 285,
02700	      CMU Psychology Dept., 1974. See also 4IJCAI, pp. 296-303.
02800	[25]   Waterman, D. A., @i[Generalization Learning Techniques for Automating the
02900	      Learning of Heuristics], AI Journal (forthcoming)